Video Game Sales Analysis

  • Created by Andrés Segura Tinoco
  • Created on Mar 17, 2020

Visual Analytics project to analyze and discovery insights of video game sales in recent years, with the high-level API plotly.express.

In [1]:
# Load the Pandas libraries
import pandas as pd
import numpy as np
In [2]:
# Load Plot libraries
import matplotlib.pyplot as plt
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots

Loading raw data

The first step is to load the dataset into a pandas dataframe. We will only work with data before 2018, to have a more accurate analysis.

In [3]:
dataURL = "../data/vgsales.csv"
raw_data = pd.read_csv(dataURL)
#raw_data = raw_data.query("Year < 2018")
In [4]:
raw_data
Out[4]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77 8.46 82.74
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81 0.77 40.24
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79 3.31 35.82
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28 2.96 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37
... ... ... ... ... ... ... ... ... ... ... ...
16593 16596 Woody Woodpecker in Crazy Castle 5 GBA 2002.0 Platform Kemco 0.01 0.00 0.00 0.00 0.01
16594 16597 Men in Black II: Alien Escape GC 2003.0 Shooter Infogrames 0.01 0.00 0.00 0.00 0.01
16595 16598 SCORE International Baja 1000: The Official Game PS2 2008.0 Racing Activision 0.00 0.00 0.00 0.00 0.01
16596 16599 Know How 2 DS 2010.0 Puzzle 7G//AMES 0.00 0.01 0.00 0.00 0.01
16597 16600 Spirits & Spells GBA 2003.0 Platform Wanadoo 0.01 0.00 0.00 0.00 0.01

16598 rows × 11 columns

Now the basic statistics of the numeric fields are shown, to have a quick understanding of the behavior of the data.

In [5]:
raw_data[["NA_Sales", "EU_Sales", "JP_Sales", "Other_Sales", "Global_Sales"]].describe()
Out[5]:
NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
count 16598.000000 16598.000000 16598.000000 16598.000000 16598.000000
mean 0.264667 0.146652 0.077782 0.048063 0.537441
std 0.816683 0.505351 0.309291 0.188588 1.555028
min 0.000000 0.000000 0.000000 0.000000 0.010000
25% 0.000000 0.000000 0.000000 0.000000 0.060000
50% 0.080000 0.020000 0.000000 0.010000 0.170000
75% 0.240000 0.110000 0.040000 0.040000 0.470000
max 41.490000 29.020000 10.220000 10.570000 82.740000

1. Video Games Sales per Year

In [6]:
# Total sales ($M)
gd_sales = raw_data.groupby(["Year"]).sum()
gd_sales.reset_index(inplace=True)
In [7]:
# Plot global trend
fig = px.line(gd_sales, x="Year", y="Global_Sales")
fig.add_shape(dict(type="line", x0=2008, y0=0, x1=2008, y1=700, line=dict(color="RoyalBlue", width=2, dash="dot")))
fig.update_layout(height=400)
fig.update_yaxes(title_text="# Global Sales")
fig.show()

Insights

  • It is clearly observed that the year with the highest sales of video games was 2008. It should also be taken into account that this was the year for which the most records were reported.

2. Top 50 best-selling video games

Now we can plot the top 50 best-selling video games in the world.

In [8]:
# Data
top_games = 50
raw_data.head(10)
Out[8]:
Rank Name Platform Year Genre Publisher NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1 Wii Sports Wii 2006.0 Sports Nintendo 41.49 29.02 3.77 8.46 82.74
1 2 Super Mario Bros. NES 1985.0 Platform Nintendo 29.08 3.58 6.81 0.77 40.24
2 3 Mario Kart Wii Wii 2008.0 Racing Nintendo 15.85 12.88 3.79 3.31 35.82
3 4 Wii Sports Resort Wii 2009.0 Sports Nintendo 15.75 11.01 3.28 2.96 33.00
4 5 Pokemon Red/Pokemon Blue GB 1996.0 Role-Playing Nintendo 11.27 8.89 10.22 1.00 31.37
5 6 Tetris GB 1989.0 Puzzle Nintendo 23.20 2.26 4.22 0.58 30.26
6 7 New Super Mario Bros. DS 2006.0 Platform Nintendo 11.38 9.23 6.50 2.90 30.01
7 8 Wii Play Wii 2006.0 Misc Nintendo 14.03 9.20 2.93 2.85 29.02
8 9 New Super Mario Bros. Wii Wii 2009.0 Platform Nintendo 14.59 7.06 4.70 2.26 28.62
9 10 Duck Hunt NES 1984.0 Shooter Nintendo 26.93 0.63 0.28 0.47 28.31
In [9]:
# Plot the best-selling video games, colored by Publisher
fig = px.bar(raw_data.head(top_games), x = 'Global_Sales', y = 'Name', color='Publisher', orientation='h')
fig.update_layout(yaxis={'categoryorder':'total ascending'}, 
                  showlegend=True)
fig.update_layout(height=800, title_text="Top 50 Best-Selling Video Games")
fig.update_xaxes(title_text="# Global Sales")
fig.update_yaxes(title_text="")
fig.show()

Insights

  • It is clear that the best-selling game is Wii Sports with approximately 80M copies. Keep in mind, that this game came by default with the first Wii console.
  • Also, the best-selling video game company is Nintendo (Royal Blue color).

3. Video Game Sales grouped by Platform

In [10]:
# Grouped data
gd = raw_data.groupby(['Platform', 'Publisher']).sum()
gd.reset_index(inplace=True)
gd.head(10)
Out[10]:
Platform Publisher Rank Year NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 2600 20th Century Fox Video Games 27266 9907.0 1.82 0.10 0.0 0.01 1.94
1 2600 Activision 119222 49571.0 18.17 1.08 0.0 0.20 19.42
2 2600 Answer Software 4018 1982.0 0.46 0.03 0.0 0.01 0.50
3 2600 Atari 224046 83266.0 46.69 2.86 0.0 0.48 50.09
4 2600 Avalon Interactive 8460 1982.0 0.15 0.01 0.0 0.00 0.17
5 2600 Bomb 7151 1982.0 0.21 0.01 0.0 0.00 0.22
6 2600 CBS Electronics 5775 1982.0 0.29 0.02 0.0 0.00 0.31
7 2600 CPG Products 3718 1982.0 0.50 0.03 0.0 0.01 0.54
8 2600 Coleco 21173 9906.0 2.87 0.17 0.0 0.03 3.06
9 2600 Data Age 10629 3963.0 0.66 0.04 0.0 0.00 0.71
In [11]:
# Plot Video Game Sales grouped by Platform
fig = px.treemap(gd, path=['Platform', 'Publisher'], values='Global_Sales')
fig.show()

Insights

  • The 4 platforms that have contributed the most to video game sales are: PS2, PS3, X360 and Wii.
  • Closely followed by Nintendo DS and PS (1).

4. Video Game Sales grouped by Publisher

In [12]:
# Plot Video Game Sales grouped by Publisher
fig = px.treemap(gd, path=['Publisher', 'Platform'], values='Global_Sales')
fig.show()

Insights

  • Regarding the biggest publishers of video games, Nintendo clearly wins.
  • Other great cross-platform publishers are: Electronic Arts, Activision and Ubisoft.
In [13]:
top_companies = 10
In [14]:
# Top 10 Companies
gd = raw_data.groupby(['Publisher']).sum()
gd = gd.sort_values(by='Global_Sales', ascending=False)
top_companies = list(gd.head(top_companies).index)
top_companies
Out[14]:
['Nintendo',
 'Electronic Arts',
 'Activision',
 'Sony Computer Entertainment',
 'Ubisoft',
 'Take-Two Interactive',
 'THQ',
 'Konami Digital Entertainment',
 'Sega',
 'Namco Bandai Games']
In [15]:
# Grouped data
gd = raw_data[raw_data["Publisher"].isin(top_companies)].groupby(['Year', 'Publisher']).sum()
gd = gd.sort_values(by='Year', ascending=True)
gd.reset_index(inplace=True)
gd.head(10)
Out[15]:
Year Publisher Rank NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1980.0 Activision 20153 2.82 0.18 0.00 0.03 3.02
1 1981.0 Activision 16815 7.95 0.46 0.00 0.08 8.50
2 1982.0 Activision 19454 1.75 0.11 0.00 0.03 1.86
3 1982.0 Sega 4803 0.37 0.02 0.00 0.00 0.40
4 1983.0 Activision 12903 1.81 0.11 0.00 0.02 1.94
5 1983.0 Nintendo 7404 2.32 0.46 8.10 0.08 10.96
6 1984.0 Namco Bandai Games 5833 0.45 0.14 2.81 0.03 3.43
7 1984.0 Nintendo 8921 32.57 1.95 10.36 0.67 45.56
8 1984.0 Activision 6298 0.26 0.01 0.00 0.00 0.27
9 1985.0 Activision 18677 0.42 0.06 0.00 0.01 0.48

Plotting Sales Trends of Top 10 Publishers

In [16]:
fig = px.line(gd, x="Year", y="Global_Sales", color='Publisher')
fig.update_yaxes(title_text="# Global Sales")
fig.show()

This multi-line chart confirms the insights obtained in point 4.

6. Distribution of Video Game Sales

Regarding Platform and Genre from 2013.

In [17]:
# Parallel Categories Diagram
fig = px.parallel_categories(raw_data.query('Year >= 2013'), dimensions=['Platform', 'Genre'])
fig.show()

Insights

  • As of 2013, the majority of video games sold were of the genres: Action, Role-Playing (RPG) and Adventure, closely followed by the genres: Sports and Shooter.
  • Regarding the platforms that sold the most video games as of 2013, PlayStation (PS3, PS4 and PSV) is the clear winner, contributing approximately 40% of video games sold.
  • The participation of games by gender is practically distributed uniformly by platform.

7. Evolution of Video Game sales by Genre

In [18]:
# Top 10 Companies
gd = raw_data.groupby(['Year', 'Genre']).sum()
gd.reset_index(inplace=True)
gd
Out[18]:
Year Genre Rank NA_Sales EU_Sales JP_Sales Other_Sales Global_Sales
0 1980.0 Action 5368 0.32 0.02 0.00 0.00 0.34
1 1980.0 Fighting 2671 0.72 0.04 0.00 0.01 0.77
2 1980.0 Misc 16956 2.53 0.15 0.00 0.02 2.71
3 1980.0 Shooter 804 6.56 0.43 0.00 0.08 7.07
4 1980.0 Sports 4027 0.46 0.03 0.00 0.01 0.49
... ... ... ... ... ... ... ... ...
384 2016.0 Sports 360626 4.57 7.36 0.78 1.92 14.60
385 2016.0 Strategy 138736 0.11 0.32 0.05 0.04 0.50
386 2017.0 Action 16441 0.00 0.00 0.01 0.00 0.01
387 2017.0 Role-Playing 30637 0.00 0.00 0.04 0.00 0.04
388 2020.0 Simulation 5959 0.27 0.00 0.00 0.02 0.29

389 rows × 8 columns

In [19]:
# Parallel Categories Diagram
fig = px.area(raw_data, x='Year', y='Global_Sales', color='Genre')
fig.update_yaxes(title_text="# Global Sales")
fig.show()

Videogame Sales by Genre for Decades

In [20]:
# Cook the data
gd_1980s = raw_data.query("Year>=1980 and Year<1990")[["Genre", "Global_Sales"]].groupby(['Genre']).sum()
gd_1980s.reset_index(inplace=True)
gd_1990s = raw_data.query("Year>=1900 and Year<2000")[["Genre", "Global_Sales"]].groupby(['Genre']).sum()
gd_1990s.reset_index(inplace=True)
gd_2000s = raw_data.query("Year>=2000 and Year<2010")[["Genre", "Global_Sales"]].groupby(['Genre']).sum()
gd_2000s.reset_index(inplace=True)
gd_2010s = raw_data.query("Year>=2010 and Year<2020")[["Genre", "Global_Sales"]].groupby(['Genre']).sum()
gd_2010s.reset_index(inplace=True)
In [21]:
# Create subplots: use 'domain' type for Pie subplot
labels = ["US", "China", "European Union", "Russian Federation", "Brazil", "India", "Rest of World"]
fig = make_subplots(rows=2, cols=2, specs=[[{'type':'domain'}, {'type':'domain'}], [{'type':'domain'}, {'type':'domain'}]])
fig.add_trace(go.Pie(labels=gd_1980s["Genre"], values=gd_1980s["Global_Sales"], name="1980s", title="1980s"), 1, 1)
fig.add_trace(go.Pie(labels=gd_1990s["Genre"], values=gd_1990s["Global_Sales"], name="1990s", title="1990s"), 1, 2)
fig.add_trace(go.Pie(labels=gd_2000s["Genre"], values=gd_2000s["Global_Sales"], name="2000s", title="2000s"), 2, 1)
fig.add_trace(go.Pie(labels=gd_2010s["Genre"], values=gd_2010s["Global_Sales"], name="2010s", title="2010s"), 2, 2)
fig.update_layout(height=800)
fig.show()

Insights

It is clear how the video game genre trend has changed in each decade:

  • In the 1980s Platform games were the most played, with 32.5%, now they only represent 4.79%.
  • Instead, the Action genre is the most famous now with a 26.7% share.
  • The 1990s was one of the most balanced, where RPG games had more prominence (12.1%).